ABSTRACT

The development of economical de novo gene synthesis
methods using microchip-synthesized oligonucleotides has
been limited by their high error rates. In this study, a
low-cost, effective and improved-throughput (up to 32 oligos
per run) error-removal method using an immobilized cellulose
column containing the mismatch binding protein MutS was
developed to generate high-quality DNA from oligos,
particularly microchip-synthesized oligonucleotides.
Error-containing DNA in the initial material was
specifically retained on the MutS-immobilized cellulose
column (MICC), and error-depleted DNA in the eluate was
collected for downstream gene assembly. Significantly, this
method increased the proportion of fluorescent clones of a
synthetic enhanced green fluorescent protein gene (720 bp)
from 0.93% to 83.22%, corresponding to a decrease in the
error frequency of the synthetic gene from 11.44/kb to
0.46/kb. In addition, a parallel multiplex MICC
error-removal strategy was evaluated in assembling 11 genes
encoding ~21 kb of DNA from 893 oligos. The error frequency
was reduced by 21.59-fold (from 14.25/kb to 0.66/kb),
resulting in a 24.48-fold increase in the percentage of
error-free assembled fragments (from 3.23% to 79.07%).
Furthermore, the standard MICC error-removal process could
be completed within 1.5 h at a cost as low as $0.374 per
MICC.

INTRODUCTION 

De novo gene synthesis is playing an increasingly important
role in synthetic biology, systems biology and general
biomedical sciences (1-9). Currently, synthesis of gene-size
fragments (500-5000 bp) typically begins with
oligonucleotides (oligos) as building blocks. Oligos are
synthesized on controlled-pore glass (CPG), followed by an
assembly step for producing long gene fragments (10,11). The
technologies for assembling these gene-size fragments into
much longer synthetic DNA constructs are now fairly mature
(7,11-15). Thus, the relatively high cost and low throughput
of oligo synthesis have become the limiting factor in
scaling up DNA synthesis. Currently, CPG oligos (100-200 nt)
supplied by vendors generally cost about $0.40-$1.00/bp
(16,17) and the throughput is low (from one to 1534
oligos/batch) (7,18,19). Fortunately, compared to
traditional CPG oligos, oligos that are synthesized on
microarrays can potentially be much cheaper, with higher
throughput (16,20-25). One million distinct oligos can be
simultaneously synthesized on a single chip (26,27), and in
some cases, microarray-based methods can provide oligo pools
containing about one million 60-mers for $600 (16,21,28).
However, the utilization of microchip-synthesized
oligonucleotides (MCp-oligos) in de novo gene synthesis has
been hindered by several technical bottlenecks. (i) Minute
quantities of each MCp-oligo (1-10 fmol per sequence) (21).
Although a large number of distinct sequences can be
simultaneously synthesized on one chip, the quantity of each
sequence is too low to support subsequent applications.
(ii) High complexity and high background (22-24,29). After
cleavage from the microchip, the oligos form a large pool
containing a huge number of different sequences. The great
diversity of these sequences in the pool makes subsequent
gene assembly more difficult. (iii) Low fidelity of the
MCp-oligos (30). Fortunately, the problems of small oligo
quantities and complex oligo pool composition can be
partially resolved via high-fidelity polymerase chain
reaction (PCR) amplification and separation of the MCp-oligo
pool into subpools via addition of primers at both ends of
the synthesized oligos (16,22). Consequently, oligo quality
has become the primary obstacle to de novo gene synthesis.
The quality of synthetic oligos can be improved by enhancing
the efficiency of each synthesis cycle during the synthesis
process and by reducing the errors in the synthesized oligos
after synthesis (16,22,29,31-35) (Tables 1 and 2). A new
chemical strategy that optimizes reagent flows and minimizes
depurination during the oligo synthesis process can reduce
the corresponding side reactions and consequently improve
oligo yield and quality (31). However, the quality of the
oligos still cannot meet the requirements of de novo DNA
synthesis. High-performance liquid chromatography and
polyacrylamide gel electrophoresis (PAGE) can also be
performed to improve the fidelity of synthetic DNA (Tables 1
and 2) (22,33-35). Using these methods, approximately 90% of
the impurities of incorrect size can be removed (29).
However, these steps often prove ineffective for error
removal in oligo pools, especially when the size of the
oligos does not differ (e.g. substitution errors) or when
the oligos in the pool initially vary in size.
Alternatively, a hybridization and selection method
requiring two microchips successfully removed errors in
oligos, providing an 8.74-fold reduction in the error
frequency (22) (Table 2). Since this method was based on
hybridization on microchips, a quality assessment microchip
was necessary, which effectively doubles the cost of each
synthesis. The combination of high-throughput pyrosequencing
and oligo retrieval-mediated error correction appears to be
the most efficient method to date (Table 2) (36). This
highly parallel method uses a robotic system to image and
directly recover beads containing sequence-verified oligos
based on a next-generation pyrosequencing platform. This
method improves the fidelity of MCp-oligos by 500-fold
compared to the initial oligo pool. However, this process is
expensive, as it depends on next-generation pyrosequencing
reagents and instruments.

On the other hand, error correction of synthetic DNA can
also be performed after assembly (16,35,37-41).
Endonucleases, which can recognize the mismatch site of DNA,
combined with exonucleases have been applied in error
correction (37) (Tables 1 and 2). Surveyor nuclease, a
commercially available CEL endonuclease that has also been
successfully used for error correction of synthetic genes,
reduced the error frequency from one error per 526 bp to one
error per 8701 bp (40). Furthermore, the ErrASE kit, another
commercially available CEL-based enzyme cocktail that
corrects errors in a similar fashion, reduced the error
rates of DNA assembled from MCp-oligos from one error per
1500 bp to one error per 7017 bp (16) (Table 2). However,
these enzymatic mismatch cleavage (EMC) methods are
expensive and time-consuming for scalable multi-gene
treatments and may decrease the probability of generating
assembled products due to over-digestion (37).
Alternatively, mismatch binding proteins have been used to
remove error-containing DNA. The mismatch binding protein
MutS can specifically recognize and bind to all possible
single-base mismatches, as well as insertion or deletion
loops of 1-5 bases, with varying affinities, and it
functions independently of other proteins or cofactors
(42,43). A Thermus aquaticus MutS protein (TaqMutS)-mediated
error-correction method yielded de novo synthesized genes
from CPG-synthesized oligos at a fidelity of one error per
10 000 bp (38) (Table 1). Another modified error-correction
method through consensus shuffling with TaqMutS reduced the
error frequency of a synthetic green fluorescent protein
(GFPuv) gene by 3.5- to 4.3-fold, reaching a final fidelity
of one error per 3500 bp (39) (Table 1). However, the
current MutS-mediated error-correction methods have only
been validated for assembled products. Additionally, these
methods typically use TaqMutS because it is more stable and
has a lower binding affinity to perfectly matched DNA
compared to Escherichia coli MutS (EcoMutS), but its binding
affinity to mismatch-containing DNA is lower than that of
EcoMutS (44,45), which reduces the efficiency of these
MutS-mediated error-correction methods.

These existing approaches for error correction of DNA are
not suitable for low-cost and high-throughput error removal
from MCp-oligos, which form a complex oligo pool consisting
of oligos with high error rates and varying Tm values
(22,36). In this study, a low-cost, effective and
improved-throughput error-removal method using immobilized
cellulose columns containing a combination of two homologs
of the mismatch binding protein MutS (EcoMutS and TaqMutS)
was developed to generate high-quality DNA from oligos,
especially MCp-oligos. After optimization of the method
using various MutS-immobilized cellulose columns (MICCs) to
remove errors from the de novo synthesized EGFP gene
assembled from MCp-oligos, the method was further validated
for its ability to remove errors from a soluble methane
monooxygenase (sMMO) gene cluster (containing sMMO X, Y, B,
Z, D, C, H and G) and the epothilone (Epo) A, B and C genes
that were synthesized de novo from MCp-oligos on a larger
scale.

MATERIALS AND METHODS 

Chemicals and strains 

All chemicals were reagent grade or higher and were pur- 
chased from Sangon Biotech Co. (Shanghai, China) unless 
otherwise noted. All restriction enzymes and T4 DNA lig- 
ase were obtained from Thermo Fisher Scientific Inc. (MA, 
USA). PrimeStar HS DNA polymerase was from TaKaRa 
(TaKaRa Biotechnology (Dalian) Co. Ltd, Dalian, China). 
Pfu DNA polymerase was from Biocolor Bioscience & 
Technology Company (BBST, Shanghai, China). KOD Plus 
DNA polymerase was from TOYOBO (Osaka, Japan). T. 
aquaticus NBRC 103206 was obtained from the NITE Biological
Resource Center (NBRC, Japan). E. coli DH5α was used as a
host cell for all DNA manipulations. E. coli BL21 Star (DE3)
(Invitrogen, Carlsbad, CA, USA) containing a protein
expression plasmid was used to express the recombinant
proteins. Luria-Bertani (LB) medium containing 100 µg/ml
ampicillin was used to cultivate E. coli and to produce the
recombinant proteins.

Expression, purification and functional evaluation of the
recombinant EcoMutS-CBM3-EGFP (eMutS) and
CBM3-TaqMutS-EGFP (tMutS) proteins

The MutS gene from E. coli or T. aquaticus (GenBank:
EcoMutS, HG738867.1 and TaqMutS, U33117.1) was amplified
from the pET32-muts plasmid (a gift from Dr Tianyin Zhong)
or the T. aquaticus genome using corresponding primers
(Supplementary Table S1), respectively,
Table 1. The effectiveness of different error-removal methods on CPG-oligos
a EMC: enzyme-mediated correction; MMC: MutS-mediated correction.

b The protein and technology used in the corresponding error correction method. 

c Not available in the literature. 

d Percentage of active clones (contains perfect clones). 



and cloned into the pET-21c vector together with the CBM3
(GenBank: HF912725.1) and EGFP (GenBank: ACX42327.1) genes,
which were amplified from the pCG plasmid (46) using the
primers shown in Supplementary Table S1. The resulting
plasmids were termed pEcoMutS-CBM3-EGFP (Supplementary
Figure S1a), which expresses the EcoMutS fusion protein
(eMutS), and pCBM3-EGFP-TaqMutS (Supplementary Figure S1b),
which expresses the TaqMutS fusion protein (tMutS). The
expression plasmids were transformed into E. coli BL21 Star
(DE3). Expression of the MutS fusion protein was induced in
LB medium containing 1 mM isopropyl-β-D-thiogalactoside
(IPTG). Then, the expressed MutS fusion protein was purified
using a Ni-NTA affinity column according to the
manufacturer's protocol (Qiagen, The Netherlands). Both MutS
fusion proteins contained CBM3, EGFP, MutS and a 6-His tag.
The recombinant MutS fusion proteins could be immobilized on
cellulose via CBM3, and the purification and immobilization
of the constructed fusion proteins could be monitored via
the fluorescence of EGFP. Detailed procedures regarding MutS
expression vector construction and MutS expression and
purification are described in the Supplementary Data.

Because regenerated amorphous cellulose (RAC) slurry was
used to immobilize MutS, the binding ability of RAC to MutS
was also determined. The maximum adsorption of MutS per gram
of RAC (Amax) and the binding constant (Ka) of eMutS and
tMutS to RAC were calculated based on the Langmuir equation
and the Hanes-Woolf method as described previously (47).
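
The Hanes-Woolf linearization referred to above can be sketched as follows. This is a minimal illustration and not the authors' analysis procedure: it assumes the Langmuir isotherm in the form A = Amax·Ka·C/(1 + Ka·C), with Ka treated as an association constant, so that C/A = C/Amax + 1/(Ka·Amax). The function names and the synthetic, noise-free data points are hypothetical.

```python
def langmuir(c, amax, ka):
    """Adsorbed amount A at free concentration c (assumed Langmuir form)."""
    return amax * ka * c / (1.0 + ka * c)


def hanes_woolf_fit(free_conc, adsorbed):
    """Estimate (Amax, Ka) by a straight-line fit of C/A against C.

    slope = 1/Amax and intercept = 1/(Ka*Amax), hence Ka = slope/intercept.
    """
    xs = list(free_conc)
    ys = [c / a for c, a in zip(free_conc, adsorbed)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 1.0 / slope, slope / intercept


# Synthetic check: generate points from known parameters and recover them.
concs = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0]          # hypothetical free concentrations
amounts = [langmuir(c, 8.89, 7.71) for c in concs]
amax_fit, ka_fit = hanes_woolf_fit(concs, amounts)
```

With real adsorption data the points scatter around the line, and the quality of the linear fit indicates how well the Langmuir model describes the binding.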



To evaluate the binding properties of EcoMutS-CBM3-EGFP
(eMutS) and CBM3-TaqMutS-EGFP (tMutS), 10 oligos were
synthesized by Sangon Biotech Co. (Shanghai, China)
(Supplementary Table S2). These oligos (A1-A5 and B1-B5) can
form all possible single-base mismatches through the
annealing of two oligos to form heteroduplexes
(Supplementary Table S3). According to previous studies,
mismatches corresponding to deletion or insertion errors are
preferred by both EcoMutS and TaqMutS (44,45). Therefore,
the binding of MutS to DNA containing an unpaired T and to
perfectly matched DNA was first used to determine the
nonspecific binding of MutS to perfectly matched DNA.
MutS-DNA binding reactions were performed using various
molar ratios of MutS (eMutS or tMutS) to DNA (58 bp
homoduplexes and unpaired T duplexes) ranging from 0:1 to
40:1. Then, binding reactions of eMutS or tMutS with the
various mismatched DNA sequences were performed to determine
the binding affinities of MutS to these mismatches. All of
the results were evaluated via a band-shift assay (48). The
details of these processes are described in the
Supplementary Data.

Construction and functional evaluation of the MICCs 

To prepare the MICCs, the MutS fusion protein was
immobilized on RAC slurry (46) via CBM3 by mixing the eMutS
or tMutS fusion protein with the RAC slurry (Supplementary
Figure S2a) (600 pmol MutS in 500 µl of the RAC slurry (20
mg/ml)) and incubating at room temperature for 10 min.
Then, the MutS-immobilized RAC slurry (1 ml) was



e102 Nucleic Acids Research, 2014, Vol 42, No. 12 PAGE 4 OF 14
a EMC: enzyme-mediated correction; MMC: MutS-mediated correction.
b The primary tools applied in the corresponding error correction technology.
c Not available in the literature.
d Percentage of active clones (contains perfect clones).



added to the chromatography column (diameter × length:
0.4 cm × 7 cm, Sangon Biotech Co.) up to a length of 2 cm.
After the slurry settled in the column, 1 ml of binding buffer
(5 mM MgCl2, 100 mM KCl, 20 mM Tris-HCl (pH 7.6) and
1 mM DTT) was added to wash away the free proteins.

Due to the differing mismatch binding specificity and 
capacity of eMutS and tMutS, five types of MICCs that 
contained eMutS (eMICC), tMutS (tMICC) or a combi- 
nation of both MutS proteins, termed etMICC (e/tMICC, 
t/eMICC or e+tMICC; differing in the packing mode of 
the two MutS proteins), were constructed to evaluate their 
error-removal ability, as shown in Supplementary Figure 
S2b. In the eMICC and the tMICC, only the corresponding 
MutS was immobilized on the RAC slurry. In the combined 
etMICC column, three types of MICCs, corresponding to 
the three packing modes, were constructed (Supplementary 
Figure S2b). For the e/tMICC, 0.5 ml of tMutS-cellulose
slurry was packed on the bottom of the column, followed by
another 0.5 ml of eMutS-cellulose slurry to form a length of
2 cm. For the t/eMICC, the MutS-cellulose slurries were
packed in the reverse order compared to the e/tMICC. For the
e+tMICC, equivalent concentrations of eMutS and tMutS were
mixed, immobilized on the RAC slurry and packed on the column.
For the eMICC and the tMICC, the molar ratio of DNA to 
MutS was 1:10 and 1:20, respectively, based on the results 
of the analysis of MutS-DNA binding experiments. For the 
etMICC, the molar ratio of DNA to eMutS and tMutS was 
1:10:10 unless otherwise noted. In these experiments, cellu- 
lose columns of equivalent length (20 mg of the RAC slurry, 
2 cm column length) were used. 

The error-removal ability of these MICCs (eMICC, tMICC and
etMICC) was evaluated based on the binding specificity to
heteroduplexes in mixtures of duplex DNA (60 pmol of oligos
in total: 45 pmol of 59 bp heteroduplex DNA containing an
unpaired T ('+T') and 15 pmol of 54 bp homoduplex DNA;
Supplementary Tables S2 and S3) as described in the
Supplementary Data. Because of the different sizes of the
heteroduplex and homoduplex DNA, the 59 bp heteroduplex and
the 54 bp homoduplex DNA could be separated, detected and
semi-quantified via PAGE. After error removal, the
error-depleted DNA (eluted fractions) was collected at
80 µl/tube and analyzed via PAGE.

Design and synthesis of the microchip-synthesized oligonu- 
cleotides 

The MCp-oligo pool (EGFP pool), encoding the egfp gene, was
used to determine the error-removal ability of each MICC and
to establish the optimal error-removal protocol. The 60
oligos in the EGFP pool, with lengths between 69 and 118 nt,
were separated into six separately amplifiable subpools
(Supplementary Table S4). Each subpool was defined by unique
primer binding sites at each terminus of each oligo
(Supplementary Tables S5 and S6 and Supplementary Figure
S3). The primer binding sites, which contained a Mly I
restriction site, could be removed by Mly I digestion. After
the error-removal efficiency of the MICC system was
confirmed during the synthesis of the EGFP gene described
above, an additional two sets of 253 MCp-oligos (sMMO) and
640 MCp-oligos (Epo A, B, C), with lengths from 63 to 129
nt, were separately designed and synthesized (sMMO and Epo
pools) on microchips (Supplementary Table S4). The sMMO and
Epo pools were separated into 57 subpools by adding various
pairs of primer binding sites to each oligo subpool
(Supplementary Tables S5 and S6 and Supplementary Figure
S4). The details of the design of these oligos are described
in the Supplementary Data, and the sequence information
regarding these designed oligos is supplied in the
Supplementary Data of Sequences.
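
The subpool design described above can be illustrated with a short sketch. It is only a toy model under stated assumptions: the flanking sequences, payload sequences and function names are hypothetical (the real primer binding sites are given in Supplementary Tables S5 and S6), selection by matching flanks stands in for selective PCR amplification, and simple trimming of the known flanks stands in for Mly I digestion.

```python
# Hypothetical flanking sites for two subpools, written as they would
# appear on the synthesized strand (5' flank, 3' flank).
SUBPOOL_SITES = {
    "subpool_1": ("ACGTTGCAGGTCAAGC", "GCTTGACCTGCAACGT"),
    "subpool_2": ("TTGACCGATCGGATCA", "TGATCCGATCGGTCAA"),
}


def tag_oligo(payload, subpool):
    """Flank a designed oligo with the sites that define its subpool."""
    five_prime, three_prime = SUBPOOL_SITES[subpool]
    return five_prime + payload + three_prime


def select_subpool(pool, subpool):
    """Return the oligos carrying both flanks of one subpool.

    Stands in for PCR amplification with that subpool's primer pair.
    """
    five_prime, three_prime = SUBPOOL_SITES[subpool]
    return [o for o in pool
            if o.startswith(five_prime) and o.endswith(three_prime)]


def strip_sites(oligo, subpool):
    """Trim the flanks from an oligo (stand-in for Mly I digestion)."""
    five_prime, three_prime = SUBPOOL_SITES[subpool]
    if not (oligo.startswith(five_prime) and oligo.endswith(three_prime)):
        raise ValueError("oligo does not belong to this subpool")
    return oligo[len(five_prime):len(oligo) - len(three_prime)]


# Toy pool with two made-up payloads per subpool.
payloads = {
    "subpool_1": ["ATGGTGAGCAAGGGCGAG", "GAGCTGTTCACCGGGGTG"],
    "subpool_2": ["GTGCCCATCCTGGTCGAG", "CTGGACGGCGACGTAAAC"],
}
pool = [tag_oligo(p, name) for name, ps in payloads.items() for p in ps]

selected = select_subpool(pool, "subpool_1")
recovered = [strip_sites(o, "subpool_1") for o in selected]
```

The point of the design is that one mixed pool can be split into small, separately amplifiable groups simply by choosing which primer pair to use.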

The above MCp-oligo pools, encoding the EGFP gene, the sMMO
gene cluster or the Epo A, B and C genes, were individually
synthesized by LC Sciences (Houston, Texas) on 4-k
microchips using light-directed synthesis methods (20).

Primer removal before gene assembly 

The PCR products of the MCp-oligos were digested using 
Mly I to remove the primer region. Then, the cleaved primer 
sequences were removed using the UNIQ-10 oligonu- 
cleotide purification kit (Sangon, Shanghai, China) accord- 
ing to the manufacturer's instructions. 

Amplification and assembly of oligo pools 

The oligos were released from the microchip surface, form- 
ing an oligo pool that contained a total of ~20 picomoles of 
oligos as previously described (22). This oligo pool was used 
as the PCR template without any additional purification. 
PCRs were performed using KOD Plus DNA polymerase 
and the appropriate primers (Supplementary Tables S5 and 
S6). After the primer regions of the oligos were removed as 
described above, polymerase cycling assembly (PCA) (10), 
ligase chain reaction (LCR) (22) or the combination of these 
two methods (PCA-LCR) was performed to assemble these 
oligos into target fragments. Then, the assembled fragments 
corresponding to each gene, containing overlaps of ~30 bp 
Figure 1. Schematic representation of removal of error-containing
oligonucleotides or assembled DNA using a MICC. (a)
Microchip-synthesized oligos or assembled DNA fragments are amplified
via PCR. (b) Amplified oligos or DNA fragments are re-annealed to
expose errors. (c) Re-annealed oligos or DNA fragments are loaded onto
a MICC. (d) After elution, the error-containing oligos or DNA
fragments are retained on the column, and the error-free oligos or DNA
fragments elute through the column and are collected. (e) The
collected error-free oligos or DNA fragments are amplified via PCR to
generate additional material for subsequent applications.

between the neighboring fragments, were assembled into 
full-length genes via overlapping extension PCR (OE-PCR) 
(49). The details of oligo amplification and assembly and 
OE-PCR for full-length genes are described in the Supple- 
mentary Data section. 

Error removal using a MICC 

Removal of error-containing DNA was performed using a MICC
as shown in Figure 1. First, the DNA was re-annealed to
expose the errors as mismatches. For the re-annealing
procedure, the DNA samples were diluted in 50 µl of
annealing buffer containing 10 mM Tris-HCl (pH 7.6), 50 mM
NaCl and 1 mM EDTA to a final concentration of 50 ng/µl
(approximately 1 µM for oligos and 0.3 µM for fragments).
Then, the DNA samples were slowly cooled from 100°C to 25°C
in a water bath. Next, 240 µl of the re-annealed DNA
(diluted in binding buffer to 12.5 ng/µl) was loaded on the
MICC. The error-depleted oligos were eluted in 1 ml of
binding buffer and collected in fractions of 80 µl per 1.5
ml Eppendorf tube. The DNA concentration of each fraction
was quantified via Nanodrop or detected via 6% PAGE when the
concentration was too low to be quantified via Nanodrop. The
first several collected fractions that contained
error-depleted DNA served as the templates for the
subsequent steps. In brief, 0.5 µl of the error-depleted DNA
was used as the PCR template without any additional
purification. KOD Plus DNA polymerase was used for oligo
amplification, and Pfu DNA polymerase was used for fragment
amplification using the appropriate primers (Supplementary
Tables S5 and S6) as described in the Supplementary Data
section.
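
As a worked check of the loading step, the amount of oligo DNA applied to a column can be estimated from the figures given above. This sketch assumes the stated equivalence for oligos (50 ng/µl is approximately 1 µM) and the 1:10:10 molar ratio of DNA to eMutS and tMutS used for the standard etMICC; the function and variable names are illustrative only.

```python
def oligo_load_pmol(volume_ul, conc_ng_per_ul, ng_per_ul_at_1_uM=50.0):
    """Approximate pmol of oligo duplex in a load.

    Uses the rule of thumb from the text that 50 ng/µl of oligo
    corresponds to about 1 µM; µM multiplied by µl gives pmol.
    """
    conc_uM = conc_ng_per_ul / ng_per_ul_at_1_uM
    return conc_uM * volume_ul


load_ng = 240 * 12.5                       # total mass loaded, in ng
dna_pmol = oligo_load_pmol(240, 12.5)      # approximate molar amount

# MutS needed for a 1:10:10 ratio of DNA : eMutS : tMutS.
muts_needed_pmol = {"eMutS": 10 * dna_pmol, "tMutS": 10 * dna_pmol}
```

Under these assumptions the load is 3000 ng, or roughly 60 pmol of oligo, and the corresponding 600 pmol of each MutS protein agrees with the 600 pmol per 500 µl of RAC slurry used to prepare the columns.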

Evaluation of the efficiency of the MICC methods to remove 
errors during de novo EGFP gene synthesis 

After the ability of MICCs to remove error-containing CPG
oligos was confirmed, unpurified MCp-oligos that could be
assembled into a 720 bp gene encoding EGFP were chosen to
evaluate the ability of the MICC system to remove
error-containing DNA during de novo gene synthesis. The
error-removal process was performed on the EGFP oligos as
shown in Figure 2. In brief, to improve the oligo quantity
and reduce the complexity of the MCp-oligo pool, the
original oligo pool cleaved from the microchip was separated
into subpools via differential amplification using the
various primer pairs that were added at the termini of the
oligos contained in each subpool (Figure 2a-c). The oligos
in each subpool were re-annealed to expose errors (Figure
2d). Then, the error-containing oligos in each subpool were
removed using an individual MICC (Figure 2e). After error
removal, these subpools were amplified via PCR (Figure 2f).
These amplified products were used to assemble specific
target DNA fragments or genes (Figure 2g and h). Each oligo
subpool of EGFP was amplified using specific primer pairs
(Supplementary Tables S5 and S6), re-annealed and corrected
according to the methods described in the section 'Error
removal using a MICC'.

To evaluate and optimize the error-removal ability of the
MICC method, three MICCs (eMICC, tMICC and etMICC) were
individually examined. First, the error-removal efficacies
of three types of combined etMICCs (e/tMICC, t/eMICC and
e+tMICC), which differed in the packing mode as described
above, were compared. Then, the error-removal abilities of
four t/eMICCs containing various molar ratios of eMutS to
tMutS (1:0.5, 1:1, 1:2 and 1:3) in cellulose columns of the
same length (20 mg cellulose slurry, 2 cm column length)
were investigated (the molar ratios of DNA to eMutS and
tMutS were 1:10:5, 1:10:10, 1:10:20 and 1:10:30,
respectively). After error removal using the MICC, the
error-depleted oligos were assembled into fragments via the
LCR method as described in the Supplementary Data section.

To further improve the fidelity of the de novo synthesized 
EGFP genes, another round of error removal using a MICC 
was performed on the fragments (assembled from the error- 
depleted oligo subpool). 

Functional evaluation and sequencing of the synthetic EGFP 
gene 

The error-removal efficiencies of MICCs during de novo
synthesis of the EGFP gene were evaluated via functional
validation and sequencing. After the EGFP fragments (without
error removal or with one or two rounds of error removal)
were collected, they were fused to form full-length EGFP
gene sequences as described above. Functional validation of
the synthesized EGFP gene sequences was performed by
counting the number of visible fluorescent colonies via a
plating assay (50). In brief, the assembled full-length EGFP
DNA was digested using Nhe I and Xho I, ligated to the
pET-21c plasmid using T4 DNA ligase (NEB) and transformed
into E. coli BL21 Star (DE3) via electroporation (51). After
the transformants were cultivated at 37°C for 10 h on an LB
agar plate containing 100 µg/ml ampicillin, IPTG (0.1 mM)
was sprayed on the surface of the plate to induce the
expression of EGFP. The proportion of clones with green
fluorescence among the total clones (harboring synthesized
EGFP genes) roughly indicated the error-removal efficiency
of each MICC system.

To evaluate the error-removal efficiency via sequencing, the
assembled EGFP fragments or genes were cloned into pMD18-T
vectors (TaKaRa Bio, Dalian). Then, the clones were randomly
selected for sequencing. The sequences of the selected
clones were aligned with the EGFP DNA sequence using the
BioEdit tool (http://www.mbio.ncsu.edu/bioedit/bioedit.html).
The results were statistically analyzed according to
previously reported methods (37,40) to quantitatively
determine the error-removal efficiency.

De novo gene synthesis of the sMMO gene cluster and the 
Epo A, B and C genes 

Before the errors in the MCp-oligos for the sMMO gene
cluster and the Epo A, B and C genes were removed, the
quality of these oligos was determined. After the MCp-oligos
were amplified directly and cloned into pMD18-T vectors
(TaKaRa Bio, Dalian), they were sequenced and analyzed as
described above. Then, larger scale error removal was
performed during the de novo gene synthesis of the sMMO gene
cluster and the Epo A, B and C genes using MCp-oligo pools
as described in Figure 2. The 57 oligo subpools of the sMMO
gene cluster and the Epo A, B and C genes were individually
amplified, re-annealed and subjected to error removal using
etMICCs. The error-depleted oligos were assembled into
fragments. The errors in some of the fragments were further
removed using etMICCs. The above error-depleted oligos,
assembled fragments and genes were randomly selected for
sequencing and analyzed according to the methods described
above.

Gene expression of the sMMO gene cluster 

The assembled genes of the sMMO gene cluster produced as
described above were additionally validated via expression
in E. coli BL21 Star (DE3). The error-free sMMO X, Y, B, Z,
D, C and H genes were individually inserted between the Nde
I and Xho I sites of the pET-21c vector. The error-free sMMO
G gene was instead cloned between the Nde I and Xho I sites
of the pET-28a vector. The resulting sMMO gene expression
vectors were transformed into BL21 (DE3) cells, which were
cultivated at 37°C in LB medium (containing 100 µg/ml
ampicillin or 50 µg/ml kanamycin) and induced with IPTG
(final concentration: 1 mM) at 37°C for 4 h when the OD600
reached 0.6. The cells were harvested and then analyzed via
12% SDS-PAGE (52). Because the Epo A, B and C genes only
constitute a portion of the recombinant epothilone (Epo)
synthesis pathway in Streptomyces coelicolor and were
synthesized for another laboratory, these genes were only
validated via sequencing.

RESULTS 

Expression and functional evaluation of MutS and the 
MICCs 

The constructed MutS fusion protein was easily expressed in
E. coli BL21(DE3), and approximately 30 mg (~0.21 µmol) of
the purified MutS fusion protein (approximately 95% pure)
(Supplementary Figure S5) was typically obtained from 1000
ml of culture. This amount of the purified proteins could
support the production of 175 standard etMICCs.
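
The figure of 175 columns follows from simple arithmetic, sketched here as a check. The calculation assumes that one standard etMICC consumes 1.2 nmol of MutS in total (600 pmol each of eMutS and tMutS, as given elsewhere in the text); the implied molecular mass is derived from the stated yield and is not a value reported by the authors.

```python
purified_mg = 30.0            # protein obtained from 1000 ml of culture
purified_nmol = 210.0         # the same amount in molar terms (~0.21 µmol)
muts_per_column_nmol = 1.2    # assumed: 0.6 nmol eMutS + 0.6 nmol tMutS

# Number of standard etMICCs that one purification could supply.
columns_supported = purified_nmol / muts_per_column_nmol

# Implied average molecular mass of the fusion protein:
# mg per µmol is numerically equal to kDa.
implied_mass_kda = purified_mg / (purified_nmol / 1000.0)
```

The implied mass of roughly 143 kDa is in the range expected for a fusion of MutS with CBM3 and EGFP, which suggests the mass and molar figures in the text are mutually consistent.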

The constructed fusion protein tMutS displayed lower 
nonspecific binding to perfectly matched DNA than eMutS. 
Figure 2. Schematic representation of error removal of microchip-synthesized oligonucleotides during de novo gene synthesis. (a-c) Oligos are synthesized,
cleaved and amplified. Specific primers (black, purple or yellow) are added to separate the oligo pool into subpools via PCR. (d) The oligos are re-annealed
to expose synthetic errors, such as mismatches (black dot). (e) Errors are removed using a MICC. Each subpool is eluted through one MICC. (f) The
error-depleted subpools are amplified separately. (g) Primers are removed. (h) The DNA is assembled.
As shown in Supplementary Figure S6, when the molar ratio of
eMutS to DNA was more than 10:1, the amount of perfectly
matched DNA remaining unbound was significantly reduced, and
when the ratio was raised to 20:1, almost all of the
perfectly matched DNA was bound (Supplementary Figure S6a).
In contrast, even when the molar ratio of tMutS to DNA was
more than 20:1, less than half of the perfectly matched DNA
was bound (Supplementary Figure S6c). Therefore, the molar
ratio of MutS to DNA was set to 10:1 and 20:1 for eMutS and
tMutS, respectively, to avoid problematic nonspecific
binding.

Furthermore, the MutS fusion proteins displayed dis- 
tinct binding capacities to various mismatches, and eMutS 
bound to most mismatches more effectively than tMutS. 
As shown in Supplementary Figure S7, both MutS fusion
proteins displayed high binding affinity to most
deletion/insertion mismatches. The binding affinities of
eMutS to different substitutions varied
(G:T>G:G>C:A, C:C>A:A>G:A, T:C, T:T), whereas tMutS
displayed similar binding affinity to all substitution
mismatches. Furthermore, using the optimal molar ratios
of MutS to DNA, eMutS bound to most mismatches more 
effectively than tMutS (Supplementary Figure S7). 

The Amax values of eMutS and tMutS on the RAC slurry were
8.89 µmol/g and 11.93 µmol/g, respectively, and the Ka
values of eMutS and tMutS on the RAC slurry were 7.71 µM and
4.58 µM, respectively. Thus, during the construction of a
MICC, in which 1.2 nmol of MutS and 20 mg of RAC (1 ml of 20
mg/ml RAC) were mixed, more than 99.9% of the MutS could
theoretically be immobilized. Indeed, when a standard MICC
was constructed, less than 0.1% of the MutS could be
detected in the flow-through and wash fractions, indicating
that almost all of the added MutS protein was immobilized on
the RAC column.
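
The theoretical immobilization estimate can be reproduced with a short calculation. This is a sketch under explicit assumptions rather than the authors' own computation: the Ka values are interpreted as association constants in µM⁻¹ (the text prints the unit as µM), binding follows a single-site Langmuir equilibrium, and the binding volume is 1 ml. The function name is hypothetical.

```python
import math


def fraction_immobilized(muts_nmol, rac_mg, amax_umol_per_g, ka_per_uM,
                         volume_ml=1.0):
    """Equilibrium fraction of MutS bound to RAC (Langmuir model).

    Solves Ka = B / ((P - B) * (S - B)) for the bound concentration B,
    where P is total protein and S is total binding capacity, in µM.
    """
    p_total = muts_nmol / volume_ml                 # nmol/ml equals µM
    sites = amax_umol_per_g * rac_mg / volume_ml    # µmol/g * mg = nmol
    b_coef = ka_per_uM * (p_total + sites) + 1.0
    disc = b_coef ** 2 - 4.0 * ka_per_uM ** 2 * p_total * sites
    # Smaller root of the quadratic, written in a cancellation-free form.
    bound = 2.0 * ka_per_uM * p_total * sites / (b_coef + math.sqrt(disc))
    return bound / p_total


frac_emuts = fraction_immobilized(1.2, 20.0, 8.89, 7.71)
frac_tmuts = fraction_immobilized(1.2, 20.0, 11.93, 4.58)
```

Under this reading of Ka, both proteins come out slightly above 99.9% bound, in line with the theoretical value quoted in the text; a different interpretation of the constant would give a different figure.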

The constructed MICCs could functionally retain
mismatch-containing DNA from a DNA mixture. As shown in
Supplementary Figure S8, when three-fourths of the oligos
consisted of mismatch-containing heteroduplexes (59 bp), only
the first elution from the tMICC (Elution 2 in Supplementary
Figure S8b) contained no detectable 59 bp heteroduplexes. In
contrast, after elution from the eMICC or the etMICC, only the
perfectly matched 54 bp homoduplex (the error-free DNA) was
detected. These results indicated that both the eMICC and the
etMICC could effectively retain mismatch-containing DNA (even
when 75% of the DNA sample consisted of mismatch-containing
DNA), whereas the tMICC retained mismatches less effectively
than the eMICC or the etMICC. However, because the 54 bp
homoduplexes (error-free DNA) were clearly detected in the early
elution from the tMICC (Elution 2, Supplementary Figure S8b),
the tMICC could still be used for error correction. The recovery
efficiencies of error-free DNA (counting only the eluted
fractions that contained solely the 54 bp homoduplex band) were
5.9%, 51.6% and 86.2% for the tMICC, the eMICC and the etMICC,
respectively.
Figure 3. Evaluation of the error-removal ability of various MICCs. (a)
Functional analysis of the synthesized egfp gene. The ratio of 'fluorescent
clones' to 'analyzed clones' is calculated as described in the manuscript
for a series of assays with or without error removal using a MICC. (b)
Sequencing analysis of the synthesized egfp gene. The error frequencies of
synthesized genes were analyzed as described in the text for the synthesized
fragments and genes with or without error removal using a MICC. Then,
the occurrence of different types of errors was counted, and the error
frequency (errors per kb) of various error-removal protocols was calculated
as the ratio of each error to the total bases analyzed. t, tMICC; e, eMICC;
et, etMICC; O, one round of error removal at the oligo stage; O+F, two
rounds of error removal at both the oligo and fragment stages.



Error removal using a MICC during EGFP gene synthesis 

The fidelity of the synthetic EGFP gene assembled from MCp-oligos
without error removal was very poor. As shown in Figure 3a, only
0.93% of the analyzed clones harboring the full-length synthetic
EGFP gene displayed fluorescence. Seventy-three errors were found
among the 14 randomly selected EGFP fragments or full-length
genes (a total of 6379 bp), and the error frequency was 11.44/kb
(Table 3). Almost all types of mutations were detected, except
for the A/T to C/G transversion (Table 3).
Table 3. Error analysis of synthesized egfp gene sequences with or without MICC-mediated error removal

Error type              Untreated  tMICC                   eMICC                   etMICC
                                   One-round a Two-round b One-round a Two-round b One-round a Two-round b
Multi-error c           [?]        0           0           2           0           0           0
Deletion                24         11          5           7           4           8           2
  A                     5          1           0           0           0           0           0
  C                     8          2           0           1           0           1           0
  T                     [?]        [?]         4           [?]         4           7           2
  G                     [?]        [?]         1           1           0           0           0
Insertion               6          0           0           0           0           0           0
  A                     2          0           0           0           0           0           0
  C                     2          0           0           0           0           0           0
  T                     1          0           0           0           0           0           0
  G                     1          0           0           0           0           0           0
Substitution            39         29          33          3           2           3           1
  Transition            27         13          9           1           2           0           0
    G/C to A/T          26         9           8           0           2           0           0
    A/T to G/C          1          4           1           1           0           0           0
  Transversion          12         16          24          2           0           3           1
    G/C to C/G          2          1           0           0           0           1           0
    G/C to T/A          9          14          21          1           0           1           0
    A/T to C/G          0          1           1           1           0           0           1
    A/T to T/A          1          0           2           0           0           1           0
Total errors            73         40          38          12          6           11          3
Bases sequenced         6379       7909        7915        6943        9356        8054        6478
Error frequency         11.44      5.06        4.80        1.73        0.64        1.37        0.46
(errors per kb)

[?] Value missing from this copy of the table.



a One round of error removal at the oligo stage. 

b Two rounds of error removal at both the oligo and fragment stages.

c Error site located in a sequence that contains more than three adjacent consecutive nucleotide errors. 
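
The error frequencies in the bottom row of Table 3 follow from the 'Total errors' and 'Bases sequenced' rows. The following is a minimal sketch of that arithmetic; the dictionary keys and function names are illustrative and not taken from the paper. The fold reductions quoted in the text match ratios of the two-decimal frequencies, so the sketch rounds before dividing.

```python
# (total errors, bases sequenced) per condition, copied from Table 3.
TABLE3 = {
    "Untreated":        (73, 6379),
    "tMICC one-round":  (40, 7909),
    "tMICC two-round":  (38, 7915),
    "eMICC one-round":  (12, 6943),
    "eMICC two-round":  (6, 9356),
    "etMICC one-round": (11, 8054),
    "etMICC two-round": (3, 6478),
}

def errors_per_kb(errors, bases):
    """Error frequency in errors per kb, rounded to two decimals as in Table 3."""
    return round(errors / bases * 1000.0, 2)

def fold_reduction(condition, baseline="Untreated"):
    """Ratio of the baseline frequency to the frequency after error removal."""
    return errors_per_kb(*TABLE3[baseline]) / errors_per_kb(*TABLE3[condition])

for name, (errors, bases) in TABLE3.items():
    print(f"{name:18s} {errors_per_kb(errors, bases):6.2f}/kb "
          f"({fold_reduction(name):.2f}-fold vs untreated)")
```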



Error removal using a MICC dramatically reduced the error
frequency of the synthetic EGFP gene. Firstly, combining the two
MutS homologs removed errors from MCp-oligos more efficiently
than either MutS homolog alone. As shown in Figure 3a, the
proportion of fluorescent clones was increased by 18.30-fold
(from 0.93% to 17.02%), 45.86-fold (from 0.93% to 42.65%) and
63.91-fold (from 0.93% to 59.44%) after one round of error
removal at the oligo stage using the tMICC, the eMICC and the
etMICC, respectively. Moreover, the error frequency of the
synthetic EGFP gene was reduced by 2.26-, 6.61- and 8.35-fold
(from 11.44/kb to 5.06/kb, 1.73/kb and 1.37/kb) using the tMICC,
the eMICC and the etMICC, respectively (Table 3). These results
revealed that one round of error removal at the oligo stage using
a MICC containing a single MutS homolog significantly increased
the proportion of 'fluorescent clones' among the 'analyzed
clones', and that using a combination of MutS homologs further
improved the error-removal efficiency. The order of the
error-removal efficiency was etMICC > eMICC > tMICC.

Secondly, the packing mode of the etMICC had no significant
effect on the error-removal efficiency, but the molar ratio
between the two MutS homologs did. As shown in Supplementary
Table S7, after error removal at the EGFP oligo stage using the
e/tMICC, the t/eMICC or the e+tMICC, which differed in packing
mode, the error frequencies of the synthetic EGFP gene were
decreased to 1.44/kb for the e/tMICC, 1.41/kb for the e+tMICC
and 1.21/kb for the t/eMICC. These results indicated that all of
these etMICCs, produced using different packing modes, improved
the quality of the assembled gene to a similar extent. On the
other hand, as shown in Supplementary Table S8, after error
removal using etMICCs that contained different eMutS:tMutS
ratios (1:0.5, 1:1, 1:2 or 1:3), the error rates were decreased
to 2.31/kb, 1.21/kb, 1.93/kb and 1.73/kb, respectively. These
results indicated that the t/eMICC containing a 1:1 eMutS:tMutS
ratio was the most effective.

Finally, iterated treatment using MICCs further reduced 
the DNA errors. Gene assembly processing is error-prone, 
introducing additional errors into the synthetic DNA. 
Moreover, after one round of MICC error removal at the 
oligo stage, some error-containing oligos might escape and 
appear in the produced DNA constructs. Thus, iterating 
the error-removal process at the assembled fragment stage 
was expected to further improve the fidelity of the syn- 
thetic DNA. To analyze whether repetition of this error- 
removal process at the assembled fragment stage could fur- 
ther reduce the error frequency of the synthetic EGFP gene, 
the error-removal process was performed at both the oligo 
and fragment stages. The errors in the DNA fragments as- 
sembled from the error-depleted oligos were removed us- 
ing the corresponding MICCs once again. Both the func- 
tional assay (fluorescent clone proportion) and the sequenc- 
ing results revealed that this multi-step error-removal pro- 
cess further improved the quality of the synthetic DNA. 
Based on the functional assay, compared to one round of
error removal at the oligo stage, the 'fluorescent clones' 
to 'analyzed clones' ratio of the two-round error-removal 
process was further increased by 1.64-fold (from 17.02% 
to 27.84%), 1.49-fold (from 42.65% to 63.36%) or 1.40- 
fold (from 59.44% to 83.22%) after error removal using the 
tMICC, the eMICC or the etMICC, respectively (Figure 
3a). The sequencing results also indicated that the error fre- 
quency of the synthetic EGFP genes was further decreased 
by 1.05-fold (from 5.06/kb to 4.80/kb) using the tMICC, 
2.70-fold (from 1.73/kb to 0.64/kb) using the eMICC and 
2.98-fold (from 1.37/kb to 0.46/kb) using the etMICC (Ta- 
ble 3). After two rounds of error removal, the proportion of 
fluorescent clones was increased by 29.94-fold (from 0.93% 
to 27.84%), 68.13-fold (from 0.93% to 63.36%) and 89.48- 
fold (from 0.93% to 83.22%) using the tMICC, the eMICC 
and the etMICC, respectively (Figure 3a). In addition, the 
error frequencies of the synthetic EGFP gene were reduced 
by 2.38-, 17.88- and 24.87-fold (from 11.44/kb to 4.80/kb, 
0.64/kb and 0.46/kb) using the tMICC, the eMICC and the 
etMICC, respectively (Table 3). Therefore, iterated error re- 
moval using MICCs at both the oligo and fragment stages 
significantly improved the fidelity of synthesized genes. 



Statistical analysis of DNA sequences from de novo EGFP 
gene synthesis 

Error removal using the eMICC or the etMICC signifi- 
cantly improved the probability of obtaining an error-free 
synthetic gene. During de novo gene synthesis, the greatest 
concern is the number of clones that must be analyzed to 
identify at least one error-free sequence. As shown in Fig- 
ure 4, two-round error removal using the eMICC or the et- 
MICC at both the oligo and fragment stages significantly 
improved the probability of generating an error-free clone 
compared to gene assembly using untreated oligos or fragments.
In particular, the number of clones that must be analyzed to
identify one error-free sequence was dramatically reduced
because of the increased percentage of correct clones among the
total clones. For example, to obtain one 1 kb error-free
synthetic gene (probability >90%), two to three clones must be
screened using the two-round eMICC error-removal protocol, and
only one to two clones must be analyzed using the two-round
etMICC error-removal protocol. In contrast, without error
removal, 47 to 48 clones must be analyzed to obtain a 1 kb
error-free gene, which represents a substantial waste of
materials, time and effort. However, with the two-round tMICC
error-removal protocol, although the error frequency of the
synthetic egfp gene was reduced from 11.44/kb to 4.80/kb
(Table 3), the probability of synthesizing an error-free 1 kb
double-stranded product improved only slightly (Figure 4); i.e.
44 to 45 clones must be screened to obtain one 1 kb error-free
synthetic gene at a probability of >90%. This result may be due
to the low effectiveness of the tMICC in substitution error
removal.
Although the tMICC can remove most deletion/insertion 
errors (Figure 3b) and can improve the fidelity of synthetic 
DNA, the substitution errors remained in the DNA prod- 
ucts, which resulted in the low probability of obtaining an 
error-free sequence. 



Error removal and gene synthesis of the sMMO gene cluster 
and the Epo A, B and C genes 

The successful generation of a 720 bp egfp gene, along with the
24.87-fold reduction in the error frequency of the synthetic
gene, suggested that this method could be applied to error
removal during de novo gene synthesis on a larger scale. The
MICC system was therefore further evaluated by removing errors
from the MCp-oligos encoding the sMMO gene cluster and the Epo
A, B and C genes. In this process, the errors in the oligos of
each subpool (containing 11–32 distinct oligos) were removed
using one standard etMICC, and the error removals for all oligo
subpools could be performed in parallel. Without error removal,
the proportion of error-free oligos was 32.11%; this proportion
was significantly improved to 93.04% after one round of error
removal using the etMICC, corresponding to a reduction in the
error frequency from 12.24/kb to 0.88/kb (Supplementary
Table S9). The binding abilities of MutS to DNAs containing
various mismatches differed (Supplementary Figure S7).
Therefore, the amount of oligos bound to the MutS present in the
column is sample-dependent. However, the average binding ability
of MutS to DNA was estimated by semi-quantification of the DNA
in the eluates (unbound DNA) via PAGE (data not shown). About
6–8 pmol of MCp-oligos could be recovered after error removal
through an etMICC, which indicated that 1.2 nmol of MutS
(immobilized on an etMICC) could effectively bind about 52–54
pmol of oligos (about 60 pmol of oligos were loaded onto the
etMICC). Thus, the average amount of oligos bound to MutS was
about 0.043 mol/mol.
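
The binding ratio above is a simple mass balance: oligos retained on the column equal oligos loaded minus oligos recovered. The following is a minimal sketch using the figures quoted in the text, with hypothetical variable and function names. It gives roughly 0.043 to 0.045 mol of oligo per mol of MutS, in line with the reported value of about 0.043 mol/mol.

```python
LOADED_PMOL = 60.0           # oligos loaded onto one etMICC (approximate)
RECOVERED_PMOL = (6.0, 8.0)  # range recovered in the eluate
MUTS_PMOL = 1200.0           # 1.2 nmol of immobilized MutS

def bound_ratio(loaded_pmol, recovered_pmol, muts_pmol):
    """Moles of oligo retained per mole of immobilized MutS."""
    return (loaded_pmol - recovered_pmol) / muts_pmol

ratios = [bound_ratio(LOADED_PMOL, r, MUTS_PMOL) for r in RECOVERED_PMOL]
print(f"bound oligo per MutS: {min(ratios):.4f} to {max(ratios):.4f} mol/mol")
```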

Next, the 57 error-depleted oligo subpools were fully assembled
into 78 target fragments, as shown in Supplementary Figure S9.
One round of error removal at the oligo stage improved the
fidelity of the assembled fragments by 4.50-fold (error
frequency reduced from 14.25/kb to 3.17/kb). Furthermore, the
error rate was further reduced to 0.66/kb after another round of
error removal at the fragment stage. This improvement in
fidelity was also confirmed by the ratio of correct sequences to
total analyzed sequences. After one round of error removal, the
ratio of error-free fragments to all analyzed fragments
(~335 bp) was increased by 11.93-fold (from 3.23% to 38.53%)
(Table 4). The additional round of error removal at the fragment
stage further improved the percentage of error-free fragments
(from 38.53% to 79.07%) (Table 4).



Gene expression of the sMMO gene cluster 

The eight de novo synthesized genes of the sMMO gene cluster
obtained after MICC error removal, which were codon-optimized
based on E. coli codon usage, were strongly expressed in E. coli
BL21 (DE3) under the control of the T7/lac promoter. No
frameshift errors were found in the synthesized genes, and the
eight expressed proteins of the cluster were detected at their
expected sizes via PAGE (Supplementary Figure S10). However,
these genes could not be expressed to form an active sMMO
complex because sMMO X and Y were expressed in insoluble form.
Figure 4. Influence of the error rates on de novo gene synthesis. The number of clones that must be sequenced to identify at least one error-free sequence
with a high probability (90%) after two rounds of error removal at both the oligo and fragment stages using various MICCs.

Table 4. Error analysis of assembled fragment sequences of the sMMO gene cluster and the Epo A, B and C genes with or without error removal using
the etMICC

Error type                          Untreated (% a)   One-round (%)     Two-round (%)
Multi-error b                       6 (1.34%)         6 (2.63%)         0 (0.00%)
Deletion                            175 (39.06%)      61 (26.75%)       3 (16.67%)
Insertion                           38 (8.48%)        11 (4.82%)        0 (0.00%)
Substitution                        229 (51.12%)      150 (65.79%)      15 (83.33%)
Total errors                        448               228               18
Bases sequenced                     31 445            71 984            27 357
Error frequency (errors per kb)     14.25             3.17              0.66
Percentage of error-free synthetic  3.23              38.53             79.07
fragments or genes c (%)



a The ratio of each type of error to the total number of errors. 

b Error site located in a sequence that contains more than three adjacent consecutive nucleotide errors. 

c The length of the synthetic fragments or genes was ~335 bp for the sMMO gene cluster and the Epo A, B and C genes. 
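
The derived rows of Table 4 (the share of each error type, the total errors and the error frequency) can be recomputed from the raw counts. The following is a minimal sketch of that check; the dictionary layout and function name are illustrative only.

```python
# Error counts and bases sequenced per condition, copied from Table 4.
TABLE4 = {
    "Untreated": {"multi": 6, "deletion": 175, "insertion": 38,
                  "substitution": 229, "bases": 31445},
    "One-round": {"multi": 6, "deletion": 61, "insertion": 11,
                  "substitution": 150, "bases": 71984},
    "Two-round": {"multi": 0, "deletion": 3, "insertion": 0,
                  "substitution": 15, "bases": 27357},
}

def summarize(counts):
    """Return (total errors, percent share per error type, errors per kb)."""
    errors = {k: v for k, v in counts.items() if k != "bases"}
    total = sum(errors.values())
    shares = {k: 100.0 * v / total for k, v in errors.items()}
    per_kb = total / counts["bases"] * 1000.0
    return total, shares, per_kb

for name, counts in TABLE4.items():
    total, shares, per_kb = summarize(counts)
    share_text = ", ".join(f"{k} {v:.2f}%" for k, v in shares.items())
    print(f"{name}: {total} errors, {per_kb:.2f}/kb ({share_text})")
```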



DISCUSSION 

In this study, a high-throughput and cost-effective MICC 
error-removal method was developed, and this method 
could conveniently remove errors from MCp-oligo pools or 
assembled fragments. 

This etMICC system, containing two immobilized MutS homologs,
was more efficient for error removal than a single MutS. Due to
the different binding affinities of EcoMutS and TaqMutS to
various types of errors (44,45), a MICC with a single
immobilized MutS (EcoMutS or TaqMutS) exhibited bias in binding
to various types of errors. As shown in Figure 3b and Table 3,
the tMICC effectively removed insertion/deletion errors but was
less effective in removing substitution errors. After two rounds
of tMICC error removal, the frequencies of insertion/deletion
and substitution errors were reduced from 4.70/kb to 0.63/kb and
from 6.11/kb to 4.17/kb, respectively. In contrast, the eMICC
removed both insertion/deletion and substitution errors more
effectively than the tMICC. The frequencies of
insertion/deletion and substitution errors were reduced from
4.70/kb to 0.43/kb and from 6.11/kb to 0.21/kb, respectively.
Combining these two MutS homologs further improved the
efficiency of the MICC in removing both substitution and
insertion/deletion errors, and also reduced the influence of
biased binding. With two rounds of etMICC error removal, the
error frequencies were reduced from 4.70/kb to 0.31/kb and from
6.11/kb to 0.15/kb for the insertion/deletion and substitution
errors, respectively. Therefore, the insertion/deletion error
frequency was reduced by 7.46-, 10.93- and 15.16-fold, and the
substitution error frequency was reduced by 1.47-, 29.10- and
40.73-fold for the tMICC, the eMICC and the etMICC, respectively
(Figure 3b). Consequently, the etMICC was demonstrated to be the
optimal type of MICC for error removal.
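
The class-specific frequencies and fold reductions quoted in this paragraph can be recomputed from the deletion, insertion and substitution rows of Table 3. The following is a minimal sketch of that calculation with illustrative names. It divides the two-decimal frequencies, which reproduces the fold values given in the text.

```python
# (deletions + insertions, substitutions, bases sequenced), from Table 3.
COUNTS = {
    "Untreated":        (24 + 6, 39, 6379),
    "tMICC two-round":  (5 + 0, 33, 7915),
    "eMICC two-round":  (4 + 0, 2, 9356),
    "etMICC two-round": (2 + 0, 1, 6478),
}

def class_frequencies(condition):
    """Return (indel, substitution) frequencies in errors per kb, two decimals."""
    indel, substitution, bases = COUNTS[condition]
    return (round(indel / bases * 1000.0, 2),
            round(substitution / bases * 1000.0, 2))

def class_folds(condition, baseline="Untreated"):
    """Fold reduction of indel and substitution frequencies versus the baseline."""
    base_indel, base_sub = class_frequencies(baseline)
    indel, sub = class_frequencies(condition)
    return base_indel / indel, base_sub / sub

for name in ("tMICC two-round", "eMICC two-round", "etMICC two-round"):
    indel_fold, sub_fold = class_folds(name)
    print(f"{name}: indel {indel_fold:.2f}-fold, substitution {sub_fold:.2f}-fold")
```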

The MICC method was simpler and more cost-effective than EMC
methods. Although the Surveyor nuclease-mediated EMC method
reduced the error frequency of de novo synthesized genes using
MCp-oligos from 1.9/kb to as low as 0.11/kb (21), which was the
lowest error rate among EMC methods to date (Table 2), it still
had several disadvantages compared to the MICC error-removal
method. In the EMC methods (16,37,40), after digestion of
mismatch-containing DNA by the endonuclease, several steps,
including mismatched nucleotide removal and re-assembly, are
required to generate full-length error-free DNA sequences. In
contrast, the MICC only bound and retained the
mismatch-containing DNA in the column and did not destroy the
perfectly matched DNA structure, so that the full-length
error-free sequences were maintained. As a consequence, the
unbound DNA (error-free DNA) could be utilized directly for
subsequent applications. Compared to the EMC methods, the MICC
method does not require the expensive exonuclease used for
mismatched nucleotide removal or the DNA polymerase or ligase
used for DNA re-assembly. Furthermore, the MICC method avoids
potential problems such as over-digestion, cross-hybridization
and misassembly.

The MICC error-removal method was more convenient than previous
MutS methods. CBM3, which exhibits high affinity for cellulose
(RAC) slurry, is a robust and easily accessible molecular tag
for protein purification (53,54). In this study, after simple
mixing, the MutS fusion protein was easily immobilized on
cellulose via CBM3, forming a column that could specifically
retain mismatch-containing DNA. This process had several
advantages over previous methods. (i) The immobilization is
simple and stable. The MutS fusion protein can be immobilized on
cellulose by simply mixing them together, and the binding is
very stable in a hydrophilic solution. (ii) The manipulation is
easier. In most previously reported MutS-mediated error
correction methods (38,39), error binding and removal are
performed separately (errors are bound by MutS in solution,
followed by centrifugation, electrophoresis or column separation
to remove the DNA-MutS complexes). In contrast, the entire MICC
error-removal process was performed on a column, so the binding
and removal of the DNA-MutS complexes occurred simultaneously.
In this study, the MICC error-removal procedure, including error
removal and amplification of error-depleted DNA, could be
completed within 1.5 h (Supplementary Figure S11). (iii) The
efficiency is enhanced (38). Because the MICC is similar to an
affinity chromatography column, its chromatographic effect
renders it more effective at separating the unbound error-free
DNA from the mismatched DNA-MutS complexes. In the previously
reported MutS-mediated error correction methods (38,39), the
binding of error-containing DNA by MutS was performed directly
in solution. In this study, error removal by MutS was also
attempted on MCp-oligos directly in solution, but the result was
unsatisfactory (data not shown). Because the error rate of the
DNA sample was high (for example, three-fourths of the oligos
contained errors), error-containing DNA from DNA-MutS complexes
that escaped removal significantly reduced the fidelity of the
synthetic DNA, causing poor repeatability. Moreover, as
described previously, MutS-mediated error-removal methods are
more effective for smaller DNA sizes due to the reduced
incidence of errors per DNA duplex (38,39). In this study, the
MICC containing MutS was suitable for performing error removal
at the MCp-oligo stage, significantly reducing the oligo
(63–129 bp) error frequency, and the error-removal efficiency at
the oligo stage was higher than that at the fragment stage after
oligo assembly (258–456 bp) (Table 3, Supplementary Table S10).

Aside from amplification, common primers can also im- 
prove the efficiency of error removal. In most cases, these 
primers cannot be avoided when using MCp-oligos for 
DNA synthesis as described in the 'Introduction' section. 
The primers were used for oligo amplification and subpool 
separation. In this study, there is another benefit of the 
primers: the common primers can partially avoid the low 
binding affinity of MutS to mismatched sites at the edges 
of DNA duplexes. In a previous report (38), many errors within
15 bp of these edges were retained after MutS error removal
because of the low binding affinity of MutS to the edges of DNA
duplexes. However, in this study, no bias in the error location
was detected (data not shown), which may be attributable to the
use of common primers (>15 bp).

The MICC technology is cost-effective. The RAC slurry used for
MICC production was less expensive and more stable than other
matrices, such as chitin beads. Using 1 g of RAC ($5/g (53)) and
60 nmol of MutS fusion protein ($0.65; Supplementary Table S11),
50 standard etMICCs could be prepared. In this study, one etMICC
could remove errors from one subpool in one batch, and the
obtained error-free oligos could be used to assemble one DNA
segment 300–400 bp in length. For each etMICC, the cost (matrix
and MutS protein) was about $0.374 (Supplementary Table S11).
Furthermore, using two rounds of error removal, the cost of
error removal was as low as $0.0234/oligo ($0.374 × 2/32 oligos)
or ~$0.0016/bp ($0.374 × 2/456 bp) for a final synthesized DNA
sequence. In particular, because of the high probability of
obtaining the correct sequence after etMICC error removal, this
system could decrease the cost of the cloning and sequencing
needed to confirm correct sequences after de novo DNA synthesis,
consequently reducing the cost of DNA synthesis.
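
The per-oligo and per-bp costs follow directly from the stated cost per etMICC. The following is a minimal sketch of that arithmetic, taking the $0.374 figure as given rather than deriving it (its breakdown is in Supplementary Table S11, which is not reproduced here); the variable names are illustrative.

```python
COST_PER_MICC_USD = 0.374   # matrix plus MutS protein, as stated in the text
ROUNDS = 2                  # error removal at the oligo and fragment stages
OLIGOS_PER_SUBPOOL = 32     # upper end of the 11-32 oligos per subpool
FRAGMENT_LENGTH_BP = 456    # upper end of the assembled fragment length

# Number of standard etMICCs supported by each reagent batch.
miccs_from_rac = 1000 // 20        # 1 g of RAC at 20 mg per MICC
miccs_from_muts = 60000 // 1200    # 60 nmol of MutS at 1.2 nmol per MICC (in pmol)

cost_per_oligo = COST_PER_MICC_USD * ROUNDS / OLIGOS_PER_SUBPOOL
cost_per_bp = COST_PER_MICC_USD * ROUNDS / FRAGMENT_LENGTH_BP

print(f"MICCs per batch: {min(miccs_from_rac, miccs_from_muts)}")
print(f"cost per oligo: ${cost_per_oligo:.4f}; cost per bp: ${cost_per_bp:.4f}")
```

Because both figures use the largest subpool and the longest fragment, they are lower bounds; smaller subpools or shorter fragments raise the per-unit cost.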

The throughput of the MICC method (up to 32 oligos in each error
correction reaction) was also higher than that of previously
reported protein-mediated methods (typically one fragment per
error correction reaction) (22,37,39,40). This MICC system
provided an improved-throughput error correction method for
oligo pools. The throughput of the MICC system was 11–32
distinct oligos per MICC treatment, which could be further
improved via parallel error-removal processing. For example, the
error removal of 57 oligo subpools containing 11–32 distinct
oligos per subpool could be performed in parallel.

In this study, although the MICC system was only applied to de
novo gene synthesis based on MCp-oligos, the high efficiency and
easy operability of this system render the method amenable to
other applications. The next step in the examination of the MICC
method will focus on the scalability of this error correction
system to larger-scale de novo gene synthesis using MCp-oligos.